{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using nbconvert as a library\n",
    "\n",
    "In this notebook, you will be introduced to the programmatic API of nbconvert and how it can be used in various contexts. \n",
    "\n",
    "A great [blog post](http://jakevdp.github.io/blog/2013/04/15/code-golf-in-python-sudoku/) by [\\@jakevdp](https://github.com/jakevdp) will be used to demonstrate.  This notebook will not focus on using the command line tool. The attentive reader will point-out that no data is read from or written to disk during the conversion process. This is because nbconvert has been designed to work in memory so that it works well in a database or web-based environment too."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quick overview"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Credit: Jonathan Frederic (@jdfreder on github)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The main principle of nbconvert is to instantiate an `Exporter` that controls the pipeline through which notebooks are converted.\n",
    "\n",
    "First, download @jakevdp's notebook (if you do not have `requests`, install it by running `pip install requests`, or if you don't have pip installed, you can find it on PYPI):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "from urllib.request import urlopen\n",
    "\n",
    "url = 'http://jakevdp.github.com/downloads/notebooks/XKCD_plots.ipynb'\n",
    "response = urlopen(url).read().decode()\n",
    "response[0:60] + ' ...'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The response is a JSON string which represents a Jupyter notebook. \n",
    "\n",
    "Next, we will read the response using nbformat. Doing this will guarantee that the notebook structure is valid. Note that the in-memory format and on disk format are slightly different. In particular, on disk, multiline strings might be split into a list of strings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "import nbformat\n",
    "jake_notebook = nbformat.reads(response, as_version=4)\n",
    "jake_notebook.cells[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The nbformat API returns a special type of dictionary. For this example, you don't need to worry about the details of the structure (if you are interested, please see the [nbformat documentation](https://nbformat.readthedocs.io/en/latest/)).\n",
    "\n",
    "The nbconvert API exposes some basic exporters for common formats and defaults. You will start by using one of them. First, you will import one of these exporters (specifically, the HTML exporter), then instantiate it using most of the defaults, and then you will use it to process the notebook we downloaded earlier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "from traitlets.config import Config\n",
    "\n",
    "# 1. Import the exporter\n",
    "from nbconvert import HTMLExporter\n",
    "\n",
    "# 2. Instantiate the exporter. We use the `classic` template for now; we'll get into more details\n",
    "# later about how to customize the exporter further.\n",
    "html_exporter = HTMLExporter()\n",
    "html_exporter.template_name = 'classic'\n",
    "\n",
    "# 3. Process the notebook we loaded earlier\n",
    "(body, resources) = html_exporter.from_notebook_node(jake_notebook)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The exporter returns a tuple containing the source of the converted notebook, as well as a resources dict. In this case, the source is just raw HTML:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "print(body[:400] + '...')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you understand HTML, you'll notice that some common tags are omitted, like the `body` tag.  Those tags are included in the default `HtmlExporter`, which is what would have been constructed if we had not modified the `template_file`.\n",
    "\n",
    "The resource dict contains (among many things) the extracted `.png`, `.jpg`, etc. from the notebook when applicable.  The basic HTML exporter leaves the figures as embedded base64, but you can configure it to extract the figures.  So for now, the resource dict should be mostly empty, except for a key containing CSS and a few others whose content will be obvious:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "print(\"Resources:\", resources.keys())\n",
    "print(\"Metadata:\", resources['metadata'].keys())\n",
    "print(\"Inlining:\", resources['inlining'].keys())\n",
    "print(\"Extension:\", resources['output_extension'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Exporter`s are stateless, so you won't be able to extract any useful information beyond their configuration.  You can re-use an exporter instance to convert another notebook. In addition to the `from_notebook_node` used above, each exporter exposes `from_file` and `from_filename` methods."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extracting Figures using the RST Exporter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When exporting, you may want to extract the base64 encoded figures as files. While the HTML exporter does not do this by default, the `RstExporter` does:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# Import the RST exproter\n",
    "from nbconvert import RSTExporter\n",
    "# Instantiate it\n",
    "rst_exporter = RSTExporter()\n",
    "# Convert the notebook to RST format\n",
    "(body, resources) = rst_exporter.from_notebook_node(jake_notebook)\n",
    "\n",
    "print(body[:970] + '...')\n",
    "print('[.....]')\n",
    "print(body[800:1200] + '...')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that base64 images are not embedded, but instead there are filename-like strings, such as `output_3_0.png`.  The strings actually are (configurable) keys that map to the binary data in the resources dict.\n",
    "\n",
    "Note, if you write an RST Plugin, you are responsible for writing all the files to the disk (or uploading, etc...) in the right location.  Of course, the naming scheme is configurable.\n",
    "\n",
    "As an exercise, this notebook will show you how to get one of those images. First, take a look at the `'outputs'` of the returned resources dictionary. This is a dictionary that contains a key for each extracted resource, with values corresponding to the actual base64 encoding:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "sorted(resources['outputs'].keys())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case, there are 5 extracted binary figures, all `png`s. We can use the Image display object to actually display one of the images:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "from IPython.display import Image\n",
    "Image(data=resources['outputs']['output_3_0.png'], format='png')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that this image is being rendered without ever reading or writing to the disk."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extracting Figures using the HTML Exporter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As mentioned above, by default, the HTML exporter does not extract images -- it just leaves them as inline base64 encodings. However, this is not always what you might want. For example, here is a use case from @jakevdp:\n",
    "\n",
    "> I write an [awesome blog](http://jakevdp.github.io/) using Jupyter notebooks converted to HTML, and I want the images to be cached.  Having one html file with all of the images base64 encoded inside it is nice when sharing with a coworker, but for a website, not so much.  I need an HTML exporter, and I want it to extract the figures!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Some theory\n",
    "\n",
    "Before we get into actually extracting the figures, it will be helpful to give a high-level overview of the process of converting a notebook to a another format:\n",
    "\n",
    "1. Retrieve the notebook and it's accompanying resources (you are responsible for this).\n",
    "2. Feed the notebook into the `Exporter`, which:\n",
    "    1. Sequentially feeds the notebook into an array of `Preprocessor`s.  Preprocessors only act on the **structure** of the notebook, and have unrestricted access to it. \n",
    "    2. Feeds the notebook into the Jinja templating engine, which converts it to a particular format depending on which template is selected.\n",
    "3. The exporter returns the converted notebook and other relevant resources as a tuple.\n",
    "4. You write the data to the disk using the built-in `FilesWriter` (which writes the notebook and any extracted files to disk), or elsewhere using a custom `Writer`.\n",
    "\n",
    "### Using different preprocessors\n",
    "\n",
    "To extract the figures when using the HTML exporter, we will want to change which `Preprocessor`s we are using. There are several preprocessors that come with nbconvert, including one called the `ExtractOutputPreprocessor`.\n",
    "\n",
    "The `ExtractOutputPreprocessor` is responsible for crawling the notebook, finding all of the figures, and putting them into the resources directory, as well as choosing the key (i.e. `filename_xx_y.extension`) that can replace the figure inside the template. To enable the `ExtractOutputPreprocessor`, we must add it to the exporter's list of preprocessors:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# create a configuration object that changes the preprocessors\n",
    "from traitlets.config import Config\n",
    "c = Config()\n",
    "c.HTMLExporter.preprocessors = ['nbconvert.preprocessors.ExtractOutputPreprocessor']\n",
    "\n",
    "# create the new exporter using the custom config\n",
    "html_exporter_with_figs = HTMLExporter(config=c)\n",
    "html_exporter_with_figs.preprocessors"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can compare the result of converting the notebook using the original HTML exporter and our new customized one:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "(_, resources)          = html_exporter.from_notebook_node(jake_notebook)\n",
    "(_, resources_with_fig) = html_exporter_with_figs.from_notebook_node(jake_notebook)\n",
    "\n",
    "print(\"resources without figures:\")\n",
    "print(sorted(resources.keys()))\n",
    "\n",
    "print(\"\\nresources with extracted figures (notice that there's one more field called 'outputs'):\")\n",
    "print(sorted(resources_with_fig.keys()))\n",
    "\n",
    "print(\"\\nthe actual figures are:\")\n",
    "print(sorted(resources_with_fig['outputs'].keys()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Custom Preprocessors"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are an endless number of transformations that you may want to apply to a notebook.  In particularly complicated cases, you may want to actually create your own `Preprocessor`. Above, when we customized the list of preprocessors accepted by the `HTMLExporter`, we passed in a string -- this can be any valid module name. So, if you create your own preprocessor, you can include it in that same list and it will be used by the exporter.\n",
    "\n",
    "To create your own preprocessor, you will need to subclass from `nbconvert.preprocessors.Preprocessor` and overwrite either the `preprocess` and/or `preprocess_cell` methods."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following demonstration adds the ability to exclude a cell by index. \n",
    "\n",
    "Note: injecting cells is similar, and won't be covered here. If you want to inject static content at the beginning/end of a notebook, use a custom template."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "from traitlets import Integer\n",
    "from nbconvert.preprocessors import Preprocessor\n",
    "\n",
    "class PelicanSubCell(Preprocessor):\n",
    "    \"\"\"A Pelican specific preprocessor to remove some of the cells of a notebook\"\"\"\n",
    "    \n",
    "    # I could also read the cells from nb.metadata.pelican if someone wrote a JS extension,\n",
    "    # but for now I'll stay with configurable value. \n",
    "    start = Integer(0,  help=\"first cell of notebook to be converted\").tag(config=True)\n",
    "    end   = Integer(-1, help=\"last cell of notebook to be converted\").tag(config=True)\n",
    "    \n",
    "    def preprocess(self, nb, resources):\n",
    "        self.log.info(\"I'll keep only cells from %d to %d\", self.start, self.end)\n",
    "        nb.cells = nb.cells[self.start:self.end]                    \n",
    "        return nb, resources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here a Pelican exporter is created that takes `PelicanSubCell` preprocessors and a `config` object as parameters.  This may seem redundant, but with the configuration system you can register an inactive preprocessor on all of the exporters and activate it from config files or the command line. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# Create a new config object that configures both the new preprocessor, as well as the exporter\n",
    "c =  Config()\n",
    "c.PelicanSubCell.start = 4\n",
    "c.PelicanSubCell.end = 6\n",
    "c.RSTExporter.preprocessors = [PelicanSubCell]\n",
    "\n",
    "# Create our new, customized exporter that uses our custom preprocessor\n",
    "pelican = RSTExporter(config=c)\n",
    "\n",
    "# Process the notebook\n",
    "print(pelican.from_notebook_node(jake_notebook)[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Programmatically creating templates"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "from jinja2 import DictLoader\n",
    "\n",
    "dl = DictLoader({'footer': \n",
    "\"\"\"\n",
    "{%- extends 'lab/index.html.j2' -%} \n",
    "\n",
    "{% block footer %}\n",
    "FOOOOOOOOTEEEEER\n",
    "{% endblock footer %}\n",
    "\"\"\"})\n",
    "\n",
    "\n",
    "exportHTML = HTMLExporter(extra_loaders=[dl], template_file='footer')\n",
    "(body, resources) = exportHTML.from_notebook_node(jake_notebook)\n",
    "for l in body.split('\\n')[-4:]:\n",
    "    print(l)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Real World Uses"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "@jakevdp uses Pelican and Jupyter Notebook to blog. Pelican [will use](https://github.com/getpelican/pelican-plugins/pull/21) nbconvert programmatically to generate blog post. Have a look a [Pythonic Preambulations](http://jakevdp.github.io/) for Jake's blog post."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "@damianavila wrote the Nikola Plugin to [write blog post as Notebooks](http://damianavila.github.io/blog/posts/one-line-deployment-of-your-site-to-gh-pages.html) and is developing a js-extension to publish notebooks via one click from the web app."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "<blockquote class=\"twitter-tweet\"><p>As <a href=\"https://twitter.com/Mbussonn\">@Mbussonn</a> requested... easieeeeer! Deploy your Nikola site with just a click in the IPython notebook! <a href=\"http://t.co/860sJunZvj\">http://t.co/860sJunZvj</a> cc <a href=\"https://twitter.com/ralsina\">@ralsina</a></p>&mdash; Damián Avila (@damian_avila) <a href=\"https://twitter.com/damian_avila/statuses/370306057828335616\">August 21, 2013</a></blockquote>\n",
    "</center>"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
